Retrieving Knowledge from Technical, Manually Indexed Corpora
Authors
Abstract
In technical fields, experts have manually indexed a huge collection of texts. For this purpose, they used thesauri, which structure the sets of available keywords. Approaches to automatic indexing have made extensive use of thesauri. However, we believe that automated systems do not fully take the experts' knowledge into account. We therefore present a method for extracting that kind of knowledge from manually indexed technical corpora. The method combines a linguistic analysis with a learning algorithm and produces a set of indexing rules covering a domain specified by a thesaurus.
Similar resources
Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents
The Informedia Digital Video Library Project at Carnegie Mellon University is making large corpora of video and audio data available for full content retrieval by integrating natural language understanding, image processing, speech recognition and information retrieval. Information retrieval from corpora of speech recognition output is critical to the project’s success. In this paper, we out...
Generating Concise Rules for Retrieving Human Motions from Large Datasets
This paper proposes a method for retrieving human motion data with concise retrieval rules based on the spatio-temporal features of motion appearance. Our method first converts a motion clip into a form of clausal language that represents geometrical relations between body parts and their temporal relationship. A retrieval rule is then learned from the set of manually classified examples using in...
Domain Specific Sense Disambiguation with Unsupervised Methods
Most approaches in sense disambiguation have been restricted to supervised training over manually annotated, non-technical, English corpora. Application to a new language or technical domain requires extensive manual annotation of appropriate training corpora. As this is both expensive and inefficient, unsupervised methods are to be preferred, specifically in technical domains such as medicine....
Text-Image Interaction for Image Retrieval and Semi-Automatic Indexing
This paper addresses the issue of retrieving images based on visual content, paying particular attention to the semantic dimension of information retrieval. A brief review of existing Image Retrieval Systems is provided, highlighting a major drawback of these prototypes, namely the lack of integration between classical "semantic search" and visual similarity retrieval (i.e. content-based re...
Retrieving Domain-Specific Collocations by Co-occurrences and Word Order Constraints
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method comprises the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings as collocations. Through this method, various types of domain-specific collocations can be retrieved simultaneously. This method is pr...